accelerating deep learning
A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures
Ferrandi, Fabrizio, Curzel, Serena, Fiorin, Leandro, Ielmini, Daniele, Silvano, Cristina, Conti, Francesco, Burrello, Alessio, Barchi, Francesco, Benini, Luca, Lavagno, Luciano, Urso, Teodoro, Calore, Enrico, Schifano, Sebastiano Fabio, Zambelli, Cristian, Palesi, Maurizio, Ascia, Giuseppe, Russo, Enrico, Petra, Nicola, De Caro, Davide, Di Meo, Gennaro, Cardellini, Valeria, Filippone, Salvatore, Presti, Francesco Lo, Silvestri, Francesco, Palazzari, Paolo, Perri, Stefania
In recent years, the field of Deep Learning has seen many disruptive and impactful advancements. Given the increasing complexity of deep neural networks, the need for efficient hardware accelerators has become more and more pressing to design heterogeneous HPC platforms. The design of Deep Learning accelerators requires a multidisciplinary approach, combining expertise from several areas, spanning from computer architecture to approximate computing, computational models, and machine learning algorithms. Several methodologies and tools have been proposed to design accelerators for Deep Learning, including hardware-software co-design approaches, high-level synthesis methods, specific customized compilers, and methodologies for design space exploration, modeling, and simulation. These methodologies aim to maximize the exploitable parallelism and minimize data movement to achieve high performance and energy efficiency. This survey provides a holistic review of the most influential design methodologies and EDA tools proposed in recent years to implement Deep Learning accelerators, offering the reader a wide perspective in this rapidly evolving field. In particular, this work complements the previous survey proposed by the same authors in [203], which focuses on Deep Learning hardware accelerators for heterogeneous HPC platforms.
Accelerating Deep Learning by Focusing on the Biggest Losers
Jiang, Angela H., Wong, Daniel L. -K., Zhou, Giulio, Andersen, David G., Dean, Jeffrey, Ganger, Gregory R., Joshi, Gauri, Kaminsky, Michael, Kozuch, Michael, Lipton, Zachary C., Pillai, Padmanabhan
This paper introduces Selective-Backprop, a technique that accelerates the training of deep neural networks (DNNs) by prioritizing examples with high loss at each iteration. Selective-Backprop uses the output of a training example's forward pass to decide whether to use that example to compute gradients and update parameters, or to skip immediately to the next example. By reducing the number of computationally-expensive backpropagation steps performed, Selective-Backprop accelerates training. Evaluation on CIFAR10, CIFAR100, and SVHN, across a variety of modern image models, shows that Selective-Backprop converges to target error rates up to 3.5x faster than with standard SGD and between 1.02--1.8x faster than a state-of-the-art importance sampling approach. Further acceleration of 26% can be achieved by using stale forward pass results for selection, thus also skipping forward passes of low priority examples.
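The core idea in the abstract above (run the forward pass on every example, then backpropagate only through the high-loss ones) can be illustrated with a minimal sketch. The abstract does not state the exact selection rule, so this sketch assumes a simple "keep the top fraction of each batch by loss" rule, and it uses a toy one-dimensional linear model rather than a DNN. The names `select_high_loss`, `train_selective`, and `keep_fraction` are hypothetical and do not come from the paper.

```python
import random


def select_high_loss(losses, keep_fraction=0.5):
    """Return the (sorted) indices of the highest-loss examples.

    Assumption: a plain top-fraction rule; the paper's own rule may differ.
    """
    n_keep = max(1, int(len(losses) * keep_fraction))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:n_keep])


def train_selective(xs, ys, epochs=200, lr=0.05, keep_fraction=0.5,
                    batch_size=8, seed=0):
    """Fit y ~ w*x + b with squared loss, updating only on high-loss examples."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    backward_steps = 0  # number of examples that contributed a gradient
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Forward pass on every example in the batch.
            errors = [w * xs[i] + b - ys[i] for i in batch]
            losses = [e * e for e in errors]
            # "Backward pass" only for the selected high-loss examples.
            chosen = select_high_loss(losses, keep_fraction)
            grad_w = sum(2 * errors[j] * xs[batch[j]] for j in chosen) / len(chosen)
            grad_b = sum(2 * errors[j] for j in chosen) / len(chosen)
            w -= lr * grad_w
            b -= lr * grad_b
            backward_steps += len(chosen)
    return w, b, backward_steps


if __name__ == "__main__":
    xs = [i / 10 for i in range(-16, 16)]
    ys = [3 * x + 1 for x in xs]
    w, b, steps = train_selective(xs, ys)
    print(f"w={w:.3f} b={b:.3f} backward steps={steps}")
```

With `keep_fraction=0.5`, only half of the examples seen in each batch contribute a gradient, which is where the savings would come from when the backward pass dominates cost. In a real DNN the forward pass still runs on every example, which is the overhead the abstract's "stale forward pass results" variant aims to reduce.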
Accelerating Deep Learning with GPUs - Minds Mastering Machines [M³] London
This talk will cover how to accelerate deep learning with GPUs. GPUs have an architecture that is well-adapted to speeding up the massively parallel array calculations at the heart of deep learning. Today, manufacturers like NVIDIA are releasing GPUs with deep learning-specific features to further speed up model training and improve the throughput of deployed models. Installing and deploying GPU-accelerated code can be challenging, so Anaconda has curated popular deep learning frameworks and packaged them with GPU acceleration in the Anaconda Distribution. There they can be combined with Python packages like Pandas, Dask, and Jupyter to power data science experiments and production deployments.
Accelerating deep learning to superhuman proportions - Enterprise IT Watch Blog
Deep learning delivers extraordinary cognitive powers in the never-ending battle to distill sense from data at ever larger scales. But high performance doesn't come cheap. Deep learning relies on the application of multilevel neural-network algorithms to high-dimensional data objects. As such, it requires fast matrix manipulations in highly parallel architectures in order to identify complex, elusive patterns--such as objects, faces, voices, and threats--amid big data's "3 V" noise. As evidence for the technology's increasingly superhuman cognitive abilities, check out research projects such as this that use it to put the Turing test to shame.
Best of the web: Artificial Intelligence news for November 3, 2016
Economists have become increasingly interested in studying the nature of production functions in social policy applications, with the goal of improving productivity. Traditionally, models have assumed workers are homogeneous inputs. However, in practice, substantial variability in productivity means the marginal productivity of labor depends substantially on which new workers are hired--which requires not an estimate of a causal effect, but rather a prediction. Annoyed at being automatically tagged with Facebook's facial-recognition system? Wearing a pair of tie-dye-looking glasses could help.
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from …
For more information about embedded vision, please visit: http://www.embedded-vision.com Bill Jenkins, Senior Product Specialist for High Level Design Tools at Intel, presents the "Accelerating Deep Learning Using Altera FPGAs" tutorial at the May 2016 Embedded Vision Summit. While large strides have recently been made in the development of high-performance systems for neural networks based on multi-core technology, significant challenges in power, cost, and performance scaling remain. Field-programmable gate arrays (FPGAs) are a natural choice for implementing neural networks because they can combine computing, logic, and memory resources in a single device. Intel's Programmable Solutions Group has developed a scalable convolutional neural network reference design for deep learning systems using the OpenCL programming language built with our SDK for OpenCL.